(57) To support speculative execution in a processor, a speculative look aside table (80) stores information about deferred exceptions. Labels (60) attached to predicates in the predicate register file (50) of the processor serve as indices to entries in the speculative look aside table (80). When an exception is generated for a speculative operation, the speculative look aside table (80) is updated. Deferred exceptions are detected and handled when the processor reads the corresponding entry in the speculative look aside table during an explicit or implicit check operation.
Description
TECHNICAL FIELD

The invention relates to instruction level parallelism in a processor, and more specifically relates to an improved technique for speculative execution.

BACKGROUND OF THE INVENTION 

In the field of computer processor design, developers are always looking for ways to increase the rate at which the processor executes instructions. To accomplish this goal, the processor can be designed to execute several operations at once, or the cycle time of the processor can be reduced. One type of processor, referred to as a superscalar processor, includes special hardware to identify operations in the instruction stream that can be executed simultaneously. Unfortunately, the complexity of this hardware makes it difficult to reduce the cycle time.

Another type of processor, referred to as superparallel or Very Long Instruction Word (VLIW), relies on the compiler to schedule operations in bundles that can be executed in parallel. Since the hardware is simpler than in superscalar processors, the cycle time can be reduced further.

One problem with VLIW processors, however, is that there often are not enough independent operations to keep the hardware resources busy. The phrase commonly used to refer to the extent to which operations can be executed in parallel is "instruction level parallelism." Programs executed on VLIW processors are typically optimized to improve instruction level parallelism. This optimization can be performed in the compiler, in the hardware, by hand, or using some combination of these techniques.

Speculative code motion is a form of optimization that can improve instruction level parallelism. In general, it involves moving an operation across a conditional branch that controls its execution. In speculative code motion, one or more operations are moved from their home basic block to a previous basic block in the program. A "basic block" is a straight line sequence of operations followed by a branch. The home block is the basic block in which the speculative operation originally resides in the program. The previous basic blocks for a given basic block include all the basic blocks that can branch to the given basic block or that sequentially precede the basic block.

An operation moved in this manner is referred to as "speculative" or "anticipatory" because it is executed before it is known whether the operation will be used in the program. The result of a speculative operation may never be used because a conditional branch that leads to the home block of the operation may take a different path.

While speculative code motion can improve the performance of VLIW processors, a problem can arise when a speculative operation generates a fault. Consider, for example, the following source code:
if (A != 0) B = *A;

A non-speculative version of this code would be:



. . . (some code here)

branch to instruction X if register A holds a 0
load register B from the address in register A

X: . . .



The speculative version of this code would be:

load register C speculatively from the address in register A

. . . (some code here)
branch to instruction X if register A holds a 0

copy the contents of register C to register B
X: . . .



In this example, speculative code motion improves instruction level parallelism and has the additional benefit of reducing the impact of the latency incurred in the load operation. However, a speculative operation may generate a fault even if the result of the operation is never used in the program. For instance, in this example, the speculative load operation may generate a fault when register A holds a zero. If a speculative operation generates a fault, the fault should not be reported or processed immediately. Instead, processing of the fault should be deferred until it is known that the result of the operation will actually be used in the program. This point is sometimes referred to as the commit point: the point where use of the operation can cause a change in the state of the computer.

There are a number of possible approaches to dealing with exceptions generated during speculative execution. One conservative approach is referred to as "safe speculation." In this approach, only operations that do not generate exceptions are moved speculatively. This approach does not improve instruction level parallelism sufficiently because it precludes speculative motion of many operations. Moreover, it does not allow load operations to be executed speculatively, and therefore does not have the benefit of hiding memory latency.

Another alternative approach is referred to as boosting. In this approach, a speculative operation is tagged with the path back to its home basic block. To defer an exception, this state information must be saved until the processor takes a different execution path or uses the result of the operation in a non-speculative operation.

The need to save this state information is a drawback of the boosting technique. Additional memory is required to store this state information. This gives rise to a trade-off between the extent to which boosting can be achieved and the additional op code bits required to store the branch directions. The number of branches that an operation can be moved across is limited by the memory available to store the state information.

Another approach involves the use of a poison bit to defer exceptions. In this approach, the processor marks the result register of a speculative operation with a poison bit when an exception has been generated. When another speculative operation uses the result of this operation, the processor can propagate the exception by setting a poison bit in the result register of that operation. Processing of the exception is deferred until a non-speculative operation consumes the poison bit. At that point, the processor can report or process the exception.
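A minimal Python sketch of the poison bit scheme just described, assuming a hypothetical register model in which each register carries a `poison` flag next to its value and division by zero is the only exception considered.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    value: int = 0
    poison: bool = False  # the extra bit that must also be saved on a spill


def spec_div(a: Reg, b: Reg) -> Reg:
    """Speculative divide: never traps, records problems in the poison bit."""
    if a.poison or b.poison:
        return Reg(0, True)  # propagate an earlier deferred exception
    if b.value == 0:
        return Reg(0, True)  # defer a new exception
    return Reg(a.value // b.value)


def nonspec_use(r: Reg) -> int:
    """Non-speculative consumer: the deferred exception surfaces here."""
    if r.poison:
        raise ZeroDivisionError("deferred exception reached a non-speculative use")
    return r.value
```

The `poison` field is the per-register state that makes spilling awkward: a 64-bit register then needs 65 bits of memory.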

The poison bit approach typically requires that an extra bit be added to the op code of speculative operations in the instruction set architecture to differentiate between speculative and non-speculative operations. This is a drawback because it increases the complexity of the instruction set and requires additional memory in the register file. In addition, the poison bit must be saved when a register is spilled at a function call or context switch. It is difficult to save the poison bit because a register that holds 64 bits of data, for example, needs to be spilled to 65 bits of memory.

Yet another approach is referred to as tagging. In this approach, each operation has a tag associated with it. Typically, a tag of zero indicates that the operation is non-speculative. For speculative operations, the tag refers to memory in the processor, such as a tag table, that stores information about deferred exceptions. In this scheme, a commit operation is inserted at the home block of an operation to check for a deferred exception.

One problem with the tagging approach is that the amount of speculation is typically limited by the number of op codes available for tags. When more bits are needed to encode the tags, fewer bits are available to enhance the repertoire of operations in the instruction set architecture. Another problem is the need to explicitly clear the information stored in the tag when the branch direction skips the commit operation.

In light of the drawbacks of the above approaches to speculative execution, there is a need for an improved method and hardware support for speculative execution.

Among the drawbacks highlighted above, one of the common drawbacks is the need for extra op code bits to support speculative execution. None of the approaches known to us uses the predicate file to support speculative execution. The use of predicates is a well known technique for removing conditional branches from a program. A predicate typically consists of a single bit that controls whether the processor should execute the operation associated with it.

In a program that uses predicates, conditional branches are replaced by predicates, which serve as guards to operations. Instead of a conditional branch, the value of the predicate bit controls whether an operation will execute. In effect, the predicate "replaces" the conditional branch. The expression or expressions that comprise the conditional branch control the value of the predicate bit.
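The branch-to-predicate rewrite can be sketched as a tiny interpreter over (predicate, operation) pairs. This is an illustrative model under stated assumptions: predicate `p0` is assumed to be always true, and the `if preds[qp]` test inside the interpreter plays the role of the hardware guard that suppresses an operation, not of a branch in the program being run.

```python
def run_predicated(ops, regs, preds):
    """Execute a straight-line list of (qualifying_predicate, operation) pairs.

    The program has no jumps: an operation takes effect only if its predicate is true.
    """
    for qp, op in ops:
        if preds[qp]:
            op(regs, preds)
    return regs


# if (A != 0) B = A + 1;   rewritten without a branch
PROGRAM = [
    # The compare (guarded by the always-true p0) defines predicate p1.
    ("p0", lambda r, p: p.update(p1=r["A"] != 0)),
    # The assignment is guarded by p1 instead of sitting behind a branch.
    ("p1", lambda r, p: r.update(B=r["A"] + 1)),
]
```

Running `PROGRAM` with `A = 3` sets `B` to 4, while with `A = 0` the guarded assignment is suppressed and `B` keeps its old value.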

SUMMARY OF THE INVENTION

The invention provides a method and hardware logic for supporting speculative execution using a speculative look aside table. In one embodiment, labels, which serve as indices into a speculative look aside table, are attached to the predicates in a predicate file. The corresponding entries in the speculative look aside table store information about deferred exceptions. For instance, when an exception is generated for a speculative operation, the corresponding entry in the speculative look aside table is updated with information about the deferred exception. This information can include a single poison bit indicating that an exception has occurred as well as additional data about the exception.

An operation in the home basic block of the speculative operation or the speculative chain of operations can be used to check for a deferred exception. This operation can be an explicit check operation or an implicit check operation implemented as part of a non-speculative operation. During this check operation, the label for a speculative operation or chain of operations is decoded, and the corresponding entry in the speculative look aside table is checked to determine whether an exception has been deferred. If one has been deferred, the exception is either reported or, for recoverable exceptions, a recovery process is invoked.

In one specific implementation, the predicate includes a one-bit value that controls execution of an operation, and an n-bit label that serves as an index to the speculative look aside table. The value of the label also specifies whether the operation is speculative. To support speculative execution, two additional operations can be scheduled for a chain of speculative operations: a label predicate operation to attach a label to a predicate, and a check operation to check whether an exception has been deferred. When a predicate is a target of an operation, the value of the target predicate is defined by the result of the operation, and the target predicate receives the label of the qualifying predicate.
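The rule for target predicates can be modeled in a few lines of Python. This is a sketch with hypothetical names (`Pred`, `cmpeq`); it assumes that nothing is written when the qualifying predicate is false, and it does not model the ".u" completer used in the assembly examples.

```python
from dataclasses import dataclass


@dataclass
class Pred:
    value: bool = False
    label: int = 0  # 0 = non-speculative, otherwise an index into the SLAT


def cmpeq(pfile: dict, qp: str, t_true: str, t_false: str, x: int, y: int) -> None:
    """Model of 'qp?cmpeq t_true,t_false=x,y' with label inheritance."""
    q = pfile[qp]
    if not q.value:
        return  # assumption: a false qualifying predicate writes nothing
    result = x == y
    # Target values are defined by the compare; labels come from qp.
    pfile[t_true] = Pred(result, q.label)
    pfile[t_false] = Pred(not result, q.label)
```

Because both targets inherit the qualifying predicate's label, every operation later guarded by them stays associated with the same speculative chain.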

When an exception is generated for a speculative operation in this implementation, a poison bit in the corresponding entry of the speculative look aside table is set. If the execution of the code leads to the home block of the speculative operation that generated the exception, then a check operation is executed which reads the corresponding entry in the speculative look aside table. If the poison bit is set, the exception is handled at that time. In this implementation, the processing of the exception is deferred by attaching a label to the associated predicate and storing information about a deferred exception in a corresponding entry in a speculative look aside table.
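At this summary level, the defer-then-check cycle might be sketched as follows. The names are hypothetical, a plain dict from label to poison bit stands in for the speculative look aside table, and Python's `ArithmeticError` stands in for any deferrable exception.

```python
def run_speculative_chain(ops, label: int, slat: dict) -> list:
    """Run a chain of speculative operations that share one label.

    An exception does not trap; it sets the poison bit in slat[label].
    """
    results = []
    for op in ops:
        try:
            results.append(op())
        except ArithmeticError:
            slat[label] = True  # defer the exception
            results.append(None)
    return results


def check(label: int, slat: dict) -> None:
    """Executed only if control actually reaches the home block."""
    if slat[label]:
        raise ArithmeticError(f"deferred exception for label {label}")
```

If control never reaches the home block, `check` is simply never called and the poisoned entry has no effect.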

The approach summarized here has a number of advantages. One advantage of this approach is that no op code bits or additional bits in the general purpose or floating point registers are necessary to support speculative execution. Another advantage is that it can support more than one chain of speculative operations at a time. More operations can be executed speculatively using this approach. In addition, it can support recovery from deferred exceptions without code size explosion.

Further advantages and features will become apparent with reference to the following detailed description and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is an overview diagram of a computer system in which the invention can be implemented.

Fig. 2 is a general block diagram of a processor in which the invention can be implemented.

Fig. 3 is a block diagram illustrating a speculative look aside table and accompanying logic to support speculative execution in a processor.

Fig. 4 is a flow diagram illustrating the steps executed by the processor to perform a label predicate operation in an embodiment of the invention.

Fig. 5 is a flow diagram illustrating a process for deferring exceptions using a speculative look aside table in an embodiment of the invention.

Fig. 6 is a flow diagram illustrating how knowledge of a deferred exception can be propagated in an embodiment of the invention.

Figs. 7A and 7B are a flow diagram illustrating steps in the process of checking for and recovering from deferred exceptions in an embodiment of the invention.

DETAILED DESCRIPTION

As an overview, Fig. 1 illustrates a generalized block diagram of a computer system 20 in which an embodiment of the invention may be implemented. The computer system 20 includes a CPU 22 coupled to memory 24 and one or more peripheral devices 26 via a system bus 28. The system bus 28 carries data and control signals to the CPU 22, memory 24 and peripheral devices 26. The memory 24 preferably includes Random Access Memory (RAM), but may also be implemented with Read Only Memory (ROM), or a combination of RAM and ROM. The memory 24 stores data for one or more programs that may be executed in the computer system 20.

Fig. 2 is a general block diagram of a processor 22 in an embodiment of the invention. The processor 22 includes multiple functional units 30, one or more register files 32, and an instruction unit 34. The register files 32 typically contain several general purpose registers 36 for storing values, addresses and possibly other data. The term "general purpose registers" can include floating point, fixed point, and predicate registers, to name a few.

The architecture of the processor 22 may vary. This particular architecture merely depicts the high level hardware design of a processor 22 in one possible embodiment. Speculative execution implemented according to the invention can provide performance improvement in a variety of CPU designs, including, in particular, CPUs with multiple functional units or CPUs with multiple pipelined functional units. Speculative execution is particularly effective in enhancing performance in Very Long Instruction Word (VLIW) computers.

In the process of running a program, the CPU 22 carries out a series of instructions stored in memory 24. The instruction unit 34 fetches an instruction from memory via the system bus 28 and then decodes the instruction. Depending on the type of CPU and/or the scheduling method used, an instruction may have more than one operation. The instruction unit 34 issues operations to a functional unit 30 or to multiple functional units (shown as stacked boxes in Fig. 2). The instruction unit 34 sends control signals to a functional unit 30 to carry out the operation or operations in an instruction. In response to these control signals, the functional unit 30 reads data such as an address or value from the appropriate registers in the register file 32 and performs an operation. For some operations, the functional unit 30 writes a result back to the register file 32. For a memory store operation, the functional unit 30 reads a memory address and a value stored in the register file 32 and transfers the value directly to memory 24.

While the specific structure of the processor can vary, the invention provides support for speculative execution in 
processors that use predicates to control the execution of operations. 

In one embodiment of the invention, a speculative look aside table (SLAT) is used to support speculative execution. A label, added to an entry in the predicate register file, serves as an index to the SLAT. The SLAT stores state information used to defer exceptions generated by speculative operations. In one specific implementation, the SLAT has one fewer entry than the number of values that the label can represent because the value of zero is used to represent a non-speculative operation. The label, therefore, indicates whether or not an operation is speculative, in addition to serving as an index into the SLAT. No additional op code bits are necessary to identify an operation as speculative.
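The sizing rule can be illustrated with a small Python model. The class and method names are hypothetical; the model keeps only the poison bit per entry and maps label L to slot L - 1, so an n-bit label needs 2**n - 1 entries.

```python
class SLAT:
    """Hypothetical SLAT model indexed by an n-bit label.

    Label 0 marks a non-speculative operation, so 2**n - 1 entries suffice.
    """

    def __init__(self, label_bits: int):
        self.poison = [False] * (2 ** label_bits - 1)

    def _slot(self, label: int) -> int:
        if not 0 < label <= len(self.poison):
            raise ValueError(f"label {label} does not name a SLAT entry")
        return label - 1

    def clear(self, label: int) -> None:
        self.poison[self._slot(label)] = False

    def defer(self, label: int) -> None:
        self.poison[self._slot(label)] = True

    def poisoned(self, label: int) -> bool:
        return self.poison[self._slot(label)]


def is_speculative(label: int) -> bool:
    # The label doubles as the speculative/non-speculative marker.
    return label != 0
```

With a 4-bit label, for example, labels 1 through 15 name SLAT entries and label 0 is reserved for non-speculative operations.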

For the purposes of illustration, we describe an implementation of the invention in a processor having five stages: fetch the next operation (F), decode the operation (D), read all registers (R), execute the operation (E), and write the targets (W).

Fig. 3 is a block diagram illustrating the logic in a processor used to support speculative execution. In this implementation, the memory for storing predicates in the processor is referred to as the predicate file. The predicate file 50 includes a series of entries (52, 54, 56, for example) for storing a predicate value 58 and a label 60. The predicate value 58 (either true or false) indicates whether an operation associated with it should be executed. The label 60 serves as an index to a speculative look aside table and also indicates whether an operation associated with it is speculative. A label of zero, in this implementation, indicates that the operation is non-speculative, while a non-zero label serves as an index to the SLAT.

The functional unit of the processor reads and writes predicate values through read ports (62 and 64, for example) and write ports (62 and 66) to the predicate file 50. When a predicate is the qualifying or controlling predicate for an operation, the functional unit reads the predicate value through read port 62 and the label through read port 64. Conversely, when a predicate is a target of an operation, the functional unit writes labels to entries in the predicate file through the write port 66 of the predicate file 50. This diagram only provides one possible implementation of the predicate file; the specific structure and number of read and write ports can vary.

The SLAT 80 includes a number of entries (82, 84, and 86, for example) for storing exception data. In this particular implementation, each entry has at least one bit for storing a poison bit. When the poison bit is set, it indicates that an exception has been deferred for at least one of the operations corresponding to the SLAT entry. The SLAT 80 shown in Fig. 3 also includes a field or fields (90, for example) for storing additional flags. These flags can be used to store data for processing exceptions. For example, the SLAT entry can include a field for storing status bits used to process exceptions for IEEE floating point operations. The SLAT entry can also store the memory location of recovery code used in the recovery process for certain types of exceptions.
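One possible layout of a wider SLAT entry is sketched below in Python. The field names are hypothetical; the IEEE status field uses the five standard IEEE 754 exception flags, and treating them as sticky (OR-ed together) is an assumption about how such a field would be updated.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


class IEEEFlag(IntFlag):
    """The five IEEE 754 exception flags."""
    INVALID = 1
    DIVIDE_BY_ZERO = 2
    OVERFLOW = 4
    UNDERFLOW = 8
    INEXACT = 16


@dataclass
class SlatEntry:
    poison: bool = False                 # at least one exception deferred
    sticky: IEEEFlag = IEEEFlag(0)       # accumulated IEEE status bits
    recovery_addr: Optional[int] = None  # start of the recovery code, if recorded

    def record_fp_exception(self, flag: IEEEFlag) -> None:
        self.poison = True
        self.sticky |= flag              # sticky: bits accumulate, never cleared here
```

A one-bit SLAT corresponds to keeping only the `poison` field; the other fields are the optional extras the text mentions.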

The logic of the processor shown in Fig. 3 includes predicate decoders 100, 102 used to control access to the predicate file 50. In this particular implementation, the logic includes at least a first predicate decoder 100 for decoding a controlling predicate for an operation. The logic also includes at least a second predicate decoder 102 for decoding target predicates when predicate values and labels are written to the predicate file. The first predicate decoder 100 is identified with an "R" in parentheses to illustrate that it is used to read an entry in the predicate file. The second predicate decoder 102 is identified with a "W" in parentheses to show that it is used to write data to one or more entries in the predicate file.

As introduced above, the functional unit of the processor can read labels from the predicate file 50 through read ports 64 as shown in Fig. 3. In this particular implementation, the functional unit includes a number of control units (110, 112, for example) that form part of the execute stage of the processor. These control units include control logic to read a label from an entry in the predicate file and submit the label to one of a series of label decoders (120, 122, for example) used to read a SLAT entry. As shown in Fig. 3, the control units used to execute operations in the processor can send a label to a corresponding label decoder for the SLAT.

In one embodiment, the functional unit reads a SLAT entry to determine whether a poison bit is set. One reason to read the SLAT entry is to determine whether execution of an operation should be halted. For example, if an exception has already been deferred for a speculative chain of operations, additional operations that correspond to the same SLAT entry need not be executed. Another reason is to check whether an exception has been deferred during an explicit or implicit check operation. The logic for reading the SLAT entry is illustrated as halt and check logic 116 in Fig. 3. While shown as one block in Fig. 3, the logic for halting execution of an operation and the logic for checking a SLAT to determine whether an exception has been deferred can be implemented separately. The operation of this logic is described in further detail below.

The functional unit also includes control logic to write data to an entry in the SLAT. As shown in Fig. 3, the functional unit includes at least one control unit 114 that writes the results of an operation during the write stage of the processor. If an exception is generated for a speculative operation, the control unit updates the corresponding entry in the SLAT. In one implementation, an exception unit 118 sets a poison bit in a corresponding SLAT entry when an exception is generated for a speculative operation. As explained further below, the functional unit also writes to a SLAT entry to initialize it during an operation where a label is attached to a predicate.
The logic shown in Fig. 3 includes a series of label decoders (120, 122, 130, for example) that control access to the SLAT. To read data from a SLAT entry, the control logic (110 or 112, for example) in the functional unit issues control signals to the label decoders. Specifically, in this embodiment the control logic submits a label to the label decoders (120 or 122), which in turn decode the label and allow the functional unit to read the contents of the corresponding SLAT entry. To write to a SLAT entry, the functional unit (114, for example) sends a SLAT label to a label decoder 130 used to set or clear a bit or bits in a SLAT entry. The exception unit 118 then writes data to the SLAT to update a poison bit, for example.

The SLAT includes read ports 150, 152 and write ports 154, 156 to allow the functional unit to read and write data to a SLAT entry. The number of ports can vary depending on the implementation. The SLAT shown in Fig. 3 has a read port 150 and write port 154 to read, set and clear a poison bit in a SLAT entry. The write port 154 is used to set the poison bit in a SLAT entry when a speculative operation generates an exception and to clear the poison bit when the SLAT entry is initialized. The read port 150 is used to determine whether an exception has been deferred. For example, in one implementation the processor checks the poison bit during the execution phase to determine whether to proceed in executing the current operation. As another example, the processor checks the poison bit in response to an explicit check operation to determine whether an exception has been deferred.

The SLAT can include additional read and write ports 152, 156 to read and write additional data in implementations where the size of the SLAT entry is larger than one bit. As noted above, the SLAT entry can store data in addition to a poison bit, such as status bits used to control re-execution of operations.

To take advantage of the support for speculative execution in the processor, code executed in the processor is first optimized using speculative code motion. Speculative code motion can be performed by a compiler, manually by the assembly language programmer, by the processor hardware, or by some combination of these methods.

To support speculative execution using the SLAT, two operations are added on a trace: a label predicate operation (lblpred) and a check label operation (chklbl). The label predicate operation is inserted in the target basic block at the beginning of a chain of speculative operations; its purpose is to attach a label to a predicate. The check label operation is inserted in the home basic block, and its purpose is to check whether an exception has been deferred for any of the speculative operations in the chain.

An example will help illustrate how these operations are scheduled. Consider the following source code:

a = *x;

if ( a == 0 ) {
    b = *y;
    c = *z;
    if ( b == 0 )
        d = *v;
    else
        e = *w;
}



One possible version of the assembly code for this example is:

1. p1?ld r7=r9

2. p1?cmpeq.u p4,p0=r7,0

3. p4?ld r2=(r10)

4. p4?ld r3=(r11)

5. p4?cmpeq.u p2,p3=r2,0

6. p2?ld r4=(r12)

7. p3?ld r5=(r13)

In this example, "p_?" represents the qualifying predicate for the operation. The value of this predicate determines whether the operation will be executed. The notation "r_" refers to a register in the register file. The notation "ld" refers to a load operation, while the notation "cmpeq.u" refers to a compare operation. In a compare operation such as "p4?cmpeq.u p2,p3=r2,0", the targets of the operation are predicates (p2 and p3 in this example).

A speculative version of this code for a processor with a SLAT is as follows:

1. p1?ld r7=r9

2. p1?lblpred,11,p4

3. p4?ld r2=(r10)

4. p4?ld r3=(r11)

5. p4?cmpeq.u p2,p3=r2,0

6. p2?ld r4=(r12)

7. p3?ld r5=(r13)

8. p1?cmpeq p5,p0=r7,0

9. p5?chklbl,11,immed,p4

As shown in this example, only two additional operations are needed to speculate a chain of operations. This example also illustrates an added benefit of this approach: speculative versions of the operations are not necessary because the label attached to the predicate indicates whether an operation is speculative.
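The speculative listing can be traced with a toy Python model of operations 2 through 9. This is an illustrative sketch, not the processor's implementation. Assumptions: operation 1 is not modeled (r7 is supplied directly), a missing address stands in for a faulting load, p0 is a write-ignored sink, a chain whose poison bit is already set halts further operations, and `chklbl` returns the string "fix-up" in place of jumping by the displacement.

```python
class Machine:
    """Toy model of the SLAT example; predicates are (value, label) pairs."""

    def __init__(self, mem, regs):
        self.mem, self.regs = mem, regs
        self.preds = {"p1": (True, 0)}  # p1 guards the target block
        self.slat = {}                  # label -> poison bit

    def _read(self, qp):
        return self.preds.get(qp, (False, 0))

    def _guard(self, qp):
        value, label = self._read(qp)
        halted = label != 0 and self.slat.get(label, False)
        return value and not halted, label

    def _write_pred(self, p, value, label):
        if p != "p0":                   # p0 is assumed to ignore writes
            self.preds[p] = (value, label)

    def lblpred(self, qp, label, pn):
        value, _ = self._read(qp)
        if value:
            self.slat[label] = False    # initialise the SLAT entry
            self._write_pred(pn, value, label)

    def ld(self, qp, dst, addr_reg):
        go, label = self._guard(qp)
        if not go:
            return
        addr = self.regs[addr_reg]
        if addr in self.mem:
            self.regs[dst] = self.mem[addr]
        elif label != 0:
            self.slat[label] = True     # speculative: defer the fault
        else:
            raise MemoryError(f"load fault at {addr:#x}")

    def cmpeq(self, qp, pt, pf, reg, imm):
        go, label = self._guard(qp)
        if go:
            r = self.regs[reg] == imm
            self._write_pred(pt, r, label)      # targets inherit qp's label
            self._write_pred(pf, not r, label)

    def chklbl(self, qp, label, pn):
        value, _ = self._read(qp)       # pn is not needed by this model
        if value and self.slat.get(label, False):
            return "fix-up"
        return "continue"


def run(r7, r11):
    mem = {100: 0, 200: 7, 300: 9, 400: 8}
    m = Machine(mem, {"r7": r7, "r10": 100, "r11": r11, "r12": 300, "r13": 400})
    m.lblpred("p1", 11, "p4")             # op 2
    m.ld("p4", "r2", "r10")               # op 3
    m.ld("p4", "r3", "r11")               # op 4
    m.cmpeq("p4", "p2", "p3", "r2", 0)    # op 5
    m.ld("p2", "r4", "r12")               # op 6
    m.ld("p3", "r5", "r13")               # op 7
    m.cmpeq("p1", "p5", "p0", "r7", 0)    # op 8
    return m.chklbl("p5", 11, "p4"), m.regs  # op 9
```

A faulting speculative load (`r11 = 999`) is silently recorded when `a != 0` (`r7 = 5`) and only leads to fix-up code when the home block is reached (`r7 = 0`).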
The label predicate operation in this case has the form:
pb?lblpred,L,pn

where:

pb is the qualifying predicate of the target block, which is p1 in this example.

L is an n-bit label used to associate the operations in the chain with a SLAT entry.

pn is the target predicate. The target predicate (p4 in this example) is set to the value of the qualifying predicate and gets the label L.

When a label predicate operation is inserted, its qualifying predicate is the qualifying predicate of the target block. The target block refers to the basic block where an operation or chain of operations is moved using speculative code motion. The operands of the label predicate operation are the n-bit label (L) and the target predicate (pn).

Fig. 4 is a flow diagram illustrating the steps executed by the processor to perform a label predicate operation. For clarity, the steps of the flow diagram appear in parentheses with the accompanying description in the text below. First, the predicate decoder decodes the qualifying predicate in the decode stage (180).

Next, during the read stage, the functional unit reads the predicate value and label from the predicate file (182). For the purposes of illustration, we assume that the qualifying predicate for the label predicate operation is true in this example. If it is not, the labelling process halts and the processor moves to the next operation.

During the execute stage, the new label is initialized for the chain of speculative operations (184). In this case, this entails setting the corresponding entry in the SLAT to zero.

Next, the target predicate is decoded and the value of the qualifying predicate is copied to the target predicate (186). The label is written to the label field of the target predicate.

By attaching a label to a predicate in this manner, this method obviates the need to tag the operation itself. As a result, no op code bits are needed for the label. Rather, the op code bits already made available for predicates are used to store the indices to the SLAT.
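The Fig. 4 steps map onto a short Python function. This is a sketch under assumed data structures: the predicate file is a dict from predicate name to a (value, label) pair, the SLAT is a dict from label to poison bit, and the function name mirrors the mnemonic but is otherwise hypothetical.

```python
def lblpred(pfile: dict, slat: dict, qp: str, label: int, pn: str) -> bool:
    """Model of 'qp?lblpred,L,pn' following the steps of Fig. 4.

    Returns False when the qualifying predicate is false.
    """
    # (180) decode qp; (182) read its value and label from the predicate file.
    value, _ = pfile[qp]
    if not value:
        return False            # labelling halts; move to the next operation
    # (184) initialise the SLAT entry for the new speculative chain.
    slat[label] = False
    # (186) copy qp's value to the target predicate and write the new label.
    pfile[pn] = (value, label)
    return True
```

Clearing the entry in step 184 ensures that a poison bit left over from an earlier chain with the same label cannot be mistaken for a new deferred exception.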

The check operation in this example has the following format:
pq?chklbl,L,d,pn

where:

pq is the qualifying predicate of the home block of the speculated code. In this example, p5 is the qualifying predicate.

L is the label being checked, which is 11 in our example.

d is the program counter (PC) displacement to the fix-up code.

pn is the predicate specified in the label predicate operation, which is p4 in this example.

The check operation is inserted in the home block of a speculative operation or sequence of operations to check whether an exception has been deferred. During a check operation, the processor reads the SLAT entry corresponding to the label. If the poison bit is set, the processor either reports the exception or takes steps to recover from the exception. To recover from an exception, the processor uses the PC displacement value to jump to the appropriate fix-up code.
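The control-flow effect of the check operation can be sketched as a function that returns the next PC. Assumptions, all hypothetical: the PC advances by one per operation, the SLAT is a dict from label to poison bit, the `recoverable` flag decides between reporting and recovery, and the pn operand is omitted because the label already selects the SLAT entry.

```python
def chklbl(pc: int, qp_value: bool, label: int, d: int, slat: dict,
           recoverable: bool = True) -> int:
    """Model of 'pq?chklbl,L,d,pn'; returns the next PC."""
    if not qp_value or not slat.get(label, False):
        return pc + 1           # no deferred exception: fall through
    if not recoverable:
        raise RuntimeError(f"deferred exception for label {label}")  # report it
    return pc + d               # jump to the fix-up code
```

For example, a check at PC 100 with displacement 40 continues at 101 when the poison bit is clear and at 140 when it is set.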

In one alternative implementation of the SLAT, the need for an explicit check operation can be avoided. In this implementation, the label predicate operation encodes the location of the starting point of the recovery code in the corresponding SLAT entry. Certain non-speculative operations in the instruction set, such as a memory store, are then used to check for deferred exceptions. For instance, a non-speculative operation located in the home basic block can make an implicit check for a deferred exception, rather than the explicit check described above. When such a non-speculative operation is executed under the control of a predicate with a non-zero label, an exception is generated if the poison bit is not zero. An exception handler then uses the address field in the corresponding SLAT entry to locate the recovery code.
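An implicit check can be sketched as a store that inspects the SLAT before writing memory. In this hypothetical Python model, each SLAT entry is a dict with `poison` and `recovery` fields, the recovery address is assumed to have been recorded earlier by the label predicate operation, and the handler looks that address up in the SLAT entry.

```python
class DeferredException(Exception):
    def __init__(self, label: int):
        super().__init__(f"deferred exception for label {label}")
        self.label = label


def store(mem: dict, addr: int, value: int, pred_label: int, slat: dict) -> None:
    """Non-speculative store that doubles as an implicit check."""
    if pred_label != 0 and slat[pred_label]["poison"]:
        raise DeferredException(pred_label)  # the store itself does not happen
    mem[addr] = value


def exception_handler(exc: DeferredException, slat: dict) -> int:
    # Locate the recovery code through the address field of the SLAT entry.
    return slat[exc.label]["recovery"]
```

The store only performs the check when its qualifying predicate carries a non-zero label, so ordinary non-speculative code pays nothing for it.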

Before describing the check operation in more detail, we first describe how the SLAT can be used to defer exceptions for speculative operations.

Fig. 5 is a flow diagram illustrating a process for deferring exceptions using a SLAT. Assuming a pipelined processor with five stages as set forth above, the processor begins by fetching the next operation. During the decode stage, the processor inserts the number of the controlling predicate for the operation into the predicate decoder 100, and the predicate decoder locates the corresponding entry in the predicate file (200).

During the read stage, the functional unit (110, for example) reads the 1-bit predicate value and the n-bit label from the predicate file (202). If the value of the predicate is zero, then the operation does not have to be executed. For this example, we assume that the predicate is true. It should be noted that the execution of the operation can be halted at any point before the write stage, the stage at which the operation can change the state of the processor. In some cases, for example, the value of the predicate may not be known until later because it is being computed as the current operation proceeds through the pipeline. In these circumstances, the execution of the operation may be halted at a later stage if the predicate turns out to be false.

In addition to reading the value of the predicate, the processor also reads the label. Since a label of zero identifies
a non-speculative operation, the processor knows at this stage whether the operation is speculative. For the purpose
of illustration, we assume that the operation is speculative in the example shown in Fig. 5. If it is not, this simply means
that an exception, if generated, will be handled immediately rather than deferred.
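The read-stage decision can be illustrated with a predicate register entry packed as a 1-bit predicate value and an n-bit label. The bit layout (value in bit 0, label above it), the choice n = 3, and the function names are assumptions for illustration; the text fixes neither the layout nor the width.

```python
from typing import Tuple

LABEL_BITS = 3  # hypothetical label width n


def pack_entry(value: bool, label: int, n: int = LABEL_BITS) -> int:
    """Pack a predicate entry: bit 0 is the predicate, bits 1..n the label."""
    if not 0 <= label < (1 << n):
        raise ValueError(f"label {label} does not fit in {n} bits")
    return (label << 1) | int(value)


def read_stage(entry: int) -> Tuple[bool, int, bool]:
    """Return (execute, label, speculative) for a packed predicate entry."""
    value = bool(entry & 1)
    label = entry >> 1
    speculative = label != 0  # label 0 identifies a non-speculative operation
    return value, label, speculative
```

A true predicate with label 0 yields `(True, 0, False)`, so an exception raised by the operation is handled immediately; a non-zero label marks the operation as speculative.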

During the execute stage, control logic 110 in the functional unit feeds the label to a label decoder 120, which then
decodes the label (204). As the operation is executed, the processor reads the SLAT entry (206). If the poison bit is set,
the processor can turn off any multi-cycle operation and skip to the next operation (208, 210).

If an exception is generated for a speculative operation (212), the processor writes data to the corresponding entry
in the SLAT to indicate that an exception is being deferred. In this implementation, the exception unit 118 sets the
poison bit (214). If the SLAT entries are wider than one bit, then additional data such as status bits or the address of
fix-up code can be written to the SLAT entry.

For a floating point exception, for example, the processor could write IEEE mandated sticky bits to the indicated
SLAT entry. To support this feature, additional ports are added to the SLAT and another operation is used to set these
bits in the appropriate SLAT entry. Upon detection of a deferred exception, the check predicate operation transfers them
into a Floating Point Status Register. Specifically, the check operation ORs these bits into the Floating Point Status
Register. The label predicate operation clears these bits when it initializes a SLAT entry.
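The sticky-bit variant can be sketched as follows. The bit positions assigned to the five IEEE 754 exception flags and the function names are hypothetical; the sketch only shows the data flow described above: deferral ORs flags into the SLAT entry, the check ORs them into the Floating Point Status Register, and the label predicate operation clears them.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical bit positions for the five IEEE 754 exception flags.
INVALID, DIV_BY_ZERO, OVERFLOW, UNDERFLOW, INEXACT = (1 << i for i in range(5))


@dataclass
class FpSlatEntry:
    poison: bool = False
    sticky: int = 0  # accumulated IEEE sticky bits for deferred FP exceptions


def label_predicate(slat: List[FpSlatEntry], label: int) -> None:
    slat[label] = FpSlatEntry()  # initialize: clears poison and sticky bits


def defer_fp_exception(slat: List[FpSlatEntry], label: int, flags: int) -> None:
    entry = slat[label]
    entry.poison = True
    entry.sticky |= flags  # record which IEEE conditions occurred


def check_fp(slat: List[FpSlatEntry], label: int, fpsr: int) -> int:
    """Return the updated Floating Point Status Register value."""
    entry = slat[label]
    if entry.poison:
        fpsr |= entry.sticky  # OR the deferred sticky bits into the FPSR
    return fpsr
```

Because the bits are ORed rather than copied, flags already present in the status register survive the check.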

In the implementation shown in Fig. 3, the processor sets the poison bit during the write stage when an exception
has been detected. The control logic in the write stage 114 feeds the label to the label decoder 130, which in turn
locates the proper SLAT entry.

If an exception is not generated (212), the processor writes the results of the operation to the appropriate result
register (216) and proceeds to the next operation.
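Steps 202 through 216 of Fig. 5 can be condensed into one Python function. This is a sketch, not a pipeline model: the stages are collapsed into sequential code, the operation is a Python callable, and `ArithmeticError` stands in for whatever exceptions the hardware would defer. The function and outcome names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SlatEntry:
    poison: bool = False


def execute_predicated(op: Callable[[], int], dest: str, pred_value: bool,
                       label: int, slat: List[SlatEntry],
                       regs: Dict[str, int]) -> str:
    """Run one predicated operation and report what happened."""
    if not pred_value:
        return "nullified"            # (202) predicate false: no state change
    if label != 0 and slat[label].poison:
        return "skipped"              # (206-210) entry already poisoned
    try:
        result = op()                 # execute stage
    except ArithmeticError:
        if label == 0:
            raise                     # non-speculative: handle immediately
        slat[label].poison = True     # (212, 214) defer by setting poison bit
        return "deferred"
    regs[dest] = result               # (216) write the result register
    return "written"
```

A speculative `1 // 0` under label 2 returns `"deferred"` and poisons entry 2; later operations under the same label then return `"skipped"`.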

Fig. 5 specifically illustrates how the SLAT is updated when an exception has been generated. To support speculative
execution with the SLAT, the processor also has to propagate knowledge that an exception has occurred. An
exception is propagated when the label for the SLAT entry holding a poison bit is copied to another predicate. For
instance, an exception can be propagated when a predicate is the target of an operation such as a compare operation.
The value of the target predicate is defined by the result of the operation. In addition, the label of the qualifying
predicate is copied to the label of the target predicate.

In one specific embodiment, all of the predicates are set to true and all of the labels are set to zero before a program
is started. To implement this approach, one predicate, p0 for example, is permanently true with a label of 0.
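The initial state and the hardwired p0 can be sketched as a small register-file class. The class name, the default size of 64, and the choice to silently ignore writes to p0 are assumptions made for illustration.

```python
from typing import Tuple


class PredicateFile:
    """Predicate register file in which each entry is a (value, label) pair."""

    def __init__(self, size: int = 64) -> None:
        # Before the program starts, every predicate is true with label 0.
        self._values = [True] * size
        self._labels = [0] * size

    def read(self, n: int) -> Tuple[bool, int]:
        return self._values[n], self._labels[n]

    def write(self, n: int, value: bool, label: int) -> None:
        if n == 0:
            return  # p0 is permanently true with label 0 (writes ignored)
        self._values[n] = value
        self._labels[n] = label
```

Operations that must always execute non-speculatively can then simply be qualified by p0.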

Fig. 6 is a flow diagram illustrating how knowledge of an exception can be propagated. The process begins in a
similar fashion to the process illustrated in Fig. 5. The predicate decoder 100 (Fig. 3) decodes the controlling predicate
(220) during the decode stage, and the control logic associated with the operation reads the predicate value and label
during the read stage (222). During the execute stage, the processor computes the results of the operation (224).

The processor then writes the results of the operation to the predicate target or targets (226) during the write stage.
In this implementation, the predicate decoder 102 decodes the number for the target predicate or predicates, and the
processor writes the predicate value and label to the appropriate entries.
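Label propagation through a compare can be sketched as below. Here the compare writes a pair of complementary target predicates, a common convention in predicated instruction sets that is assumed rather than taken from the text; `compare_lt` is a hypothetical name. The point being illustrated is that every target receives the qualifying predicate's label, so a poisoned SLAT entry remains reachable from the new predicates.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Predicate:
    value: bool = True
    label: int = 0


def compare_lt(preds: List[Predicate], qp: int, targets: Sequence[int],
               a: int, b: int) -> None:
    """Hypothetical predicated compare: targets[0] = a < b, targets[1] = not (a < b)."""
    q = preds[qp]                        # (220, 222) decode and read the qualifying predicate
    if not q.value:
        return                           # qualifying predicate false: nothing written
    result = a < b                       # (224) compute the result
    for i, t in enumerate(targets):      # (226) write each target predicate
        preds[t].value = result if i == 0 else not result
        preds[t].label = q.label         # propagate the qualifying label
```

With a qualifying predicate whose label is 0, the same write leaves each target with label 0.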

In addition to generating and propagating exception data, the processor also has a means to detect deferred
exceptions. As introduced above, one way to detect a deferred exception is to insert an explicit check operation in the
home basic block. Another way is to use an implicit check operation in the home basic block, such as the
non-speculative operation (memory store) described above.

Another aspect of the support for speculative execution using the SLAT is how the processor performs recovery.
Recovery is the process of re-executing operations when a recoverable exception is detected in response to an explicit
or implicit check operation. When the processor completes re-execution of operations in the recovery process, it then
resumes normal execution. Below, we describe the process of recovery and resumption in more detail.

Figs. 7A and 7B are a flow diagram illustrating steps for checking for and recovering from a deferred exception. This
example refers to the case where an explicit check operation is used to check for a deferred exception. At the decode
stage, the predicate decoder decodes the label for the controlling predicate (also referred to as the qualifying predicate,
pq) of the check operation (240). During the read stage, the processor reads the controlling predicate value and label
(242). If the controlling predicate value is not true (244), the rest of the check operation is skipped and the processor
proceeds to the next operation.

If the controlling predicate value is true and the controlling predicate's label is non-zero (248), the processor reports
a non-recoverable check exception (250).

If the controlling predicate value is true and the controlling predicate's label is zero (248), the processor decodes
the label (L) specified in the check operation (252) and reads the corresponding SLAT entry (254).

If the poison bit is not set (256), the processor proceeds to the next operation (258). When the poison bit is not set,
no exceptions have been deferred for the speculative operations associated with the SLAT entry. As such, there are no 
exceptions to handle and processing proceeds with the next operation. 

If the poison bit is set, the processor handles the exception. This can entail reporting the exception, if the exception 
is non-recoverable, or re-executing operations, if the exception is recoverable. Figs. 7A and 7B illustrate the specific 
case where the exception is recoverable. 
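The decision sequence of steps 240 through 258 can be sketched as follows, assuming predicates and SLAT entries are simple records and that outcomes are returned as strings. The function and outcome names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Predicate:
    value: bool = True
    label: int = 0


@dataclass
class SlatEntry:
    poison: bool = False


def check_operation(preds: List[Predicate], slat: List[SlatEntry],
                    pq: int, check_label: int) -> str:
    """Explicit check qualified by predicate pq and naming SLAT label check_label."""
    q = preds[pq]                          # (240, 242) decode and read pq
    if not q.value:
        return "next"                      # (244) pq false: skip the check
    if q.label != 0:
        return "non-recoverable check exception"  # (248, 250)
    if not slat[check_label].poison:       # (252, 254, 256)
        return "next"                      # (258) nothing was deferred
    return "handle deferred exception"     # report it or start recovery
```

Note that a check guarded by a speculative predicate is itself an error, which is why a non-zero label on pq is reported rather than deferred.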

In one implementation of the recovery process, the processor jumps to fix-up code based on the displacement 
value (d) specified in the explicit check operation (260). The processor takes a recovery fault to generic code that: 

1) saves all predicates (and possibly branch triggers) (262); 

2) sets all predicates to false except for the target predicate specified in the check operation (generally pn, and
specifically p4 in the example above) and the source predicate (p1 in the example above) (264); and

3) sets the label of the source predicate to 0 (266). 
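The generic recovery entry code (steps 262 through 266) might look like the sketch below. Branch triggers are omitted, `enter_recovery` is a hypothetical name, and skipping p0 when clearing predicates is an assumption that follows the embodiment in which p0 is permanently true.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class Predicate:
    value: bool = True
    label: int = 0


def enter_recovery(preds: List[Predicate], target: int,
                   source: int) -> List[Predicate]:
    """Prepare the predicate file for re-execution; return the saved copy."""
    saved = copy.deepcopy(preds)             # (262) save all predicates
    for n in range(1, len(preds)):           # p0 assumed hardwired true
        if n not in (target, source):
            preds[n].value = False           # (264) disable unrelated operations
    preds[source].label = 0                  # (266) clear the source label
    return saved
```

Clearing every other predicate means that only operations qualified by the source or target predicate are re-executed during recovery.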

As an additional step, the recovery process can also include clearing the poison bit of the SLAT entry. However, this
can also be performed by the label predicate operation when it initializes a SLAT entry.

At the end of the recovery process, the processor executes a series of steps to resume normal operation. In the
process of resumption, the state of the processor is reset so that subsequent operations execute normally. In one
embodiment, the process of resumption is triggered in response to a check operation. Specifically, resumption occurs
in response to a check operation where:

1) the value of the qualifying predicate is true; 

2) the label of the qualifying predicate is zero; and

3) the label of the source predicate is zero.

When these conditions are met during recovery, the processor branches to generic code that restores the
predicates (and possibly branch triggers), and resumes at the faulting check operation.
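The resumption test and predicate restore can be sketched as follows, using a simplified predicate record. The name `try_resume` and the convention of returning the faulting check's PC (or `None` when the conditions do not hold) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Predicate:
    value: bool = True
    label: int = 0


def try_resume(preds: List[Predicate], saved: List[Predicate], pq: int,
               source: int, faulting_check_pc: int) -> Optional[int]:
    """Return the PC to resume at if the resumption conditions hold, else None."""
    q = preds[pq]
    if q.value and q.label == 0 and preds[source].label == 0:
        preds[:] = saved                     # restore the saved predicates
        return faulting_check_pc             # resume at the faulting check
    return None
```

The three tests in the `if` correspond one-to-one to conditions 1) through 3) listed above.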

In the method described above, an explicit operation is not necessary to clear the state of the label. When a
predicate is a target of an operation, the label field of the target predicate is automatically reset. As noted above, the
label predicate operation takes care of resetting the corresponding SLAT entry to zero.

While we have described our invention with reference to specific embodiments, we do not intend to limit the scope
of our invention to these embodiments. The SLAT and accompanying logic can be implemented in a variety of ways. For
example, the specific design of control logic used to read and write predicate data from the predicate file can vary. In
addition, the control logic for reading and writing to SLAT entries can also vary. The size of the SLAT table and the
format of the data stored in it can also vary.

Having described and illustrated the principles of our invention with reference to a preferred embodiment and
several alternative embodiments, it should be apparent that the invention can be modified in arrangement and detail
without departing from its principles. Accordingly, we claim all modifications as may come within the scope and spirit
of the following claims.

Claims 

1. In a processor (22), logic for supporting speculative execution comprising: a speculative look aside table (80)
including a plurality of entries (82-86) for storing data indicating whether an exception is being deferred; a predicate
register file (50) including a plurality of entries (52-56) operable to store a predicate (58) for controlling execution of
one or more operations and a label (60) representing an index into the speculative look aside table; a predicate
decoder operable to read a predicate and to locate a corresponding entry in the predicate register file; control logic
(110, 112) operable to read a predicate value and a label from the corresponding entry in the predicate register file;
and a label decoder (120, 122) communicative with the control logic, the label decoder operable to receive the label
from the control logic (110, 112) and operable to locate a corresponding entry in the speculative look aside table
(80).

2. The logic of claim 1 wherein the plurality of entries in the predicate register file include a first bit field (58) for storing
a 1 bit predicate, and an n-bit label (60) for storing the index into the speculative look aside table (80), where n is
an integer greater than or equal to two.

3. The logic of claim 1 wherein the processor (22) includes a decode stage in which an operation is decoded, a read
stage in which one or more operands of the operation are read, an execute stage in which the operation is executed,
and a write stage in which a result of the operation is written to a target register; wherein the control logic
(110, 112) is operable to issue the label from the corresponding entry in the predicate register file (50) to the label
decoder during the execute stage; and wherein the logic includes halt logic (116) operable to read the corresponding
entry in the speculative look aside table during the execute stage, and operable to halt execution of the operation
when a poison bit is set in the corresponding entry in the speculative look aside table.

4. The logic of claim 1 further including check logic (116) communicative with the speculative look aside table (80),
the check logic operable to read the corresponding entry in the speculative look aside table and operable to initiate
recovery from an exception when a poison bit is set.

5. The logic of claim 1 further including a second label decoder (130), the second label decoder operable to control
writing a poison bit to the corresponding entry in the speculative look aside table (80).

6. The logic of claim 5 wherein the processor (22) includes a decode stage in which an operation is decoded, a read
stage in which one or more operands of the operation are read, an execute stage in which the operation is executed,
and a write stage in which a result of the operation is written to a target register; and wherein the second
label decoder (130) is operable to control writing the poison bit to the corresponding entry in the speculative look
aside table during the write stage.

7. A method for supporting speculative execution using a speculative look aside table (80), the method comprising:
decoding a predicate for an operation (100, 200); reading the predicate and associated label (64, 202); controlling
execution of the operation based on a value of the predicate; decoding the label (120, 204); reading an entry in the
speculative look aside table (80) corresponding to the label to determine whether an exception has been deferred
(206); and based on the entry in the speculative look aside table, controlling further execution of the operation.

8. The method of claim 7 further including: reading the label to determine whether the operation is a speculative
operation (64, 202); setting a poison bit in the entry of the speculative look aside table corresponding to the label when
the operation is speculative and generates an exception (118, 214).

9. The method of claim 7 further including: in an explicit check operation, determining whether the entry corresponding
to a label specified in the explicit check operation is storing a flag indicating that an exception has been deferred
(252, 254); and when the flag is detected, branching to re-execution code including operations qualified with
predicates having the label specified in the check operation (260-266).

10. In a processor (22) having a decode stage in which an operation is decoded, a read stage in which one or more
operands of the operation are read, an execute stage in which the operation is executed, and a write stage in which
a result of the operation is written to a target register, logic for supporting speculative execution comprising: a
speculative look aside table (80) including a plurality of entries (82-86) for storing a poison bit indicating whether an
exception is being deferred; a predicate register file (50) including a plurality of entries (52-56) operable to store a
predicate value for controlling execution of one or more operations and a label representing an index into the
speculative look aside table (80); a first predicate decoder (100) operable to read a predicate and to locate a
corresponding entry in the predicate file (50) during the read stage of the processor (22); a second predicate decoder (102)
operable to control writing of results of an operation on a predicate, including a label, to one or more entries in the
predicate file; control logic (110, 112) operable to read a predicate value and a label from the corresponding entry
in the predicate file (50) and to issue the label to a first label decoder (120); the first label decoder (120)
communicative with the control logic (110) to receive the label from the control logic, and operable to read a corresponding
entry in the speculative look aside table (80); and a second label decoder (130) communicative with the speculative
look aside table (80), the second label decoder operable to control writing a poison bit to the corresponding entry
in the speculative look aside table.